Comparative transcriptomic analysis of genes in the triterpene saponin biosynthesis pathway in leaves and roots of Ardisia kteniophylla A. DC., a plant used in traditional Chinese medicine

Abstract Ardisia kteniophylla (Primulaceae) is highly valued in traditional medicine due to its production of the pharmacologically active secondary metabolites, especially triterpenoid saponins in its roots. Although A. kteniophylla is very important in traditional medicine, the genetic basis for its production of triterpenoid saponins remains largely unknown. Therefore, we sequenced transcriptomes of A. kteniophylla to identify putative genes involved in production of triterpenoid saponins in both leaves and roots, and we used the transcriptomes to compare expression levels of these genes between the two organ systems. The production of triterpenoid saponins in plants is usually induced through hormonal signaling on account of the presence of pests. Thus, we treated plants with the hormones salicylic acid (SA) and methyl jasmonate (MeJA) and used quantitative real‐time PCR (qRT‐PCR) to investigate expression levels of genes involved in triterpenoid saponin biosynthesis. In total, we obtained transcriptomes for leaf and root tissues representing 52,454 unigenes. Compared with the leaf transcriptome, we found that 6092 unigenes were upregulated in the root, especially enzymes involved in the direct synthesis of triterpenoid saponins, while 6001 genes appeared downregulated, including those involved in precursory steps in the triterpenoid saponin biosynthesis pathway. Our results from qRT‐PCR indicate that genes within the upstream parts of the triterpenoid saponin biosynthesis pathway may be upregulated under exposure to the applied hormones, but downstream genes are downregulated. This suggests possible conflicting effects of SA and MeJA in promoting the production of secondary metabolites on the one hand, and, on the other, limiting plant growth processes to devote energy to combating pests. We also performed an analysis of transcription factors (TFs) and found 997 unique transcripts belonging to 16 TF families. Our data may help to facilitate future work on triterpene saponins biosynthesis in A. kteniophylla with potential pharmacological and molecular breeding applications.


| INTRODUC TI ON
Ardisia kteniophylla A. DC. is a perennial plant within the Primulaceae family (Ericales; Huang et al., 2017), and its roots are widely known to accumulate pharmacologically active products such as saponins, benzoquinones, flavanones, and sterols (Sun, Li, et al., 2017;Sun, Jiang, et al., 2017;Zhou, 2017). These products likely form the biochemical basis for the utility of A. kteniophylla in traditional Chinese medicine. In particular, the dried root of A. kteniophylla is used in southern China for the treatment of rheumatism, muscle and bone pain, and traumatic injury (Dai et al., 2017;Tang, 2007;Xiang & Feng, 2002). The medicinal value of A. kteniophylla has driven annually increasing demand for the species and has, thus, placed pressure on wild resources, which can no longer adequately supply the markets.
Wild populations are also threatened by large-scale illegal mining, which has resulted in considerable habitat loss. Therefore, unfortunately, A. kteniophylla is on the verge of extinction within China and globally due to both overharvesting and habitat destruction (Wei et al., 2015). While its conservation may be achievable based on comprehensive assessments of its genetic recourses, so far, genetic data for A. kteniophylla are lacking.
The most pharmacologically significant bioactive compounds produced by A. kteniophylla are triterpenoid saponins (Mu et al., 2010), which consist of triterpene sapogenin, sugars, uronic acid, and other organic acids. Broadly, terpenes, such as triterpenoid saponins, are found in many diverse medicinal plant species (e.g., Dong et al., 2021). Triterpene saponins have a wide range of pharmacological applications as anti-cancer, anti-inflammatory, anti-allergic, and anti-viral agents and have value for the treatment of leukemia and hypoglycemia, as well as for the prevention and treatment of cardiovascular and cerebrovascular diseases (Fleck et al., 2006;Ma et al., 2007;Ponou et al., 2008;Sparg et al., 2004;Sun et al., 2004Sun et al., , 2009Vermeersch et al., 2009;Yan et al., 2006). Among saponins in A. kteniophylla, one has shown significant inhibitory effects on six different lines of tumor cells (Gu et al., 2014). Broadly, the Ardisia genus appears to represent a rich source of triterpene saponins (Kobayashi & De Mejía, 2005;Su et al., 2003;Zhang, 1994).
In general, the biosynthesis of saponins is well-characterized and primarily involves three major classes of enzymes. These are oxidosqualene cyclases (OSCs), uridin diphosphate glycosyltransferases (UGTs), and cytochrome P450 monooxygenases (CYP450s; Sawai & Saito, 2011). Respectively, these classes of enzymes construct the basic triterpenoid structures, or skeletons, facilitate oxidation reactions, and catalyze the attachment of carbohydrates to a hydroxyl or similar functional groups of other molecules (i.e., glycosylation; Sawai & Saito, 2011).
In plants, saponins are part of a chemically diverse array of secondary metabolites that function in defense against microbes, diseases, and other pests (Papadopoulou et al., 1999). Plant defenses are stimulated through environmental interactions that induce signal transduction, which is often carried out by plant hormones. Among plant hormones, jasmonic acid, its methyl ester, methyl jasmonate (MeJA), and salicylic acid (SA) are endogenous and involved in signal transduction related to defense and, specifically, the production of secondary metabolites (Wang et al., 2015). When SA and MeJA are applied exogenously experimentally, they have been shown to induce upregulation of genes involved in triterpenoid saponin biosynthesis (Cao et al., 2015;Chang et al., 2016).
Unfortunately, the knowledge of genetic and hormonal regulation of biosynthesis of triterpenoid saponins within many medicinal plants, such as A. kteniophylla, is constrained by lack of genetic resources, including functionally characterized gene sequences.
Characterization and new discoveries of functional genes can be expedited for medicinal plants (and other non-model plant species) using transcriptome sequencing and de novo assembly (Liu et al., 2018;Minoche et al., 2011;Sangwan et al., 2008).
In this study, we sought to elucidate the genomic basis for the biosynthesis of triterpenoid saponins in A. kteniophylla.
Specifically, our objectives were to (1) detect and characterize genes involved in biosynthesis of triterpenoid saponins using transcriptomic data resulting from RNA sequencing (RNA-Seq), (2) compare the expression levels of genes involved in triterpenoid saponin biosynthesis in leaves and roots of A. kteniophylla to better understand the genomic mechanisms underlying disparity in accumulation of these compounds between these two organ systems, and (3) investigate differential levels of transcription of triterpenoid saponin biosynthesis under exposure to the plant hormones, SA and MeJA. Additionally, we assessed the transcription factor families represented by the transcriptomes that we sequenced from roots and leaves to infer possible mechanisms of upstream regulation of the biosynthesis of triterpene saponins in A. kteniophylla.

| Plant materials
We obtained seeds of A. kteniophylla from a group of plants of similar age and height from a nursery in Nanxiong, Guangdong, China. We grew the seeds on mixed soil (coconut bran: perlite: and Y936111001) and the Science and Technology Program of Guangdong Province, China (2019B1212006) made to Hongfeng Chen

K E Y W O R D S
DEGs, Illumina sequencing, Primulaceae, transcription factors, triterpenoid saponins

T A X O N O M Y C L A S S I F I C A T I O N
Botany; Genomics peat soil = 1:1:1) in a greenhouse at South China Botanical Garden, Guangzhou, Guangdong, China at 20-22°C, which is typical of the average annual temperature in the natural habitat of the species.
Based on the preference of A. kteniophylla for dark, humid environments (Wei, 2018), we set the greenhouse conditions at 70%-80% humidity and 10%-20% ambient occlusion at all hours of the day.
For optimal growth conditions, we sprayed the plants with water three times daily.
After two years of growth, we divided the seedlings of A. kteniophylla into SA and MeJA treatment groups and a control group (CK) with six plants in each group. We applied 1 mmol/L concentration of SA or MeJA to leaves of seedlings in the respective treatment groups by spraying until droplets on the leaves were dripping off, and we sprayed leaves of the CK plants with water based on protocols suggested in Werner and Schmülling (2009). All treatments occurred at 8 a.m., 12 a.m., and 6 p.m. daily for a total of seven days based on protocols successfully used for Ardisia crenata in a prior study (Yang, 2015).
We selected the concentrations of SA and MeJA and timings of measurements based on prior studies of the medicinal plants, especially Panax ginseng, Ardisia crenata, and Astragalus mongholicus (Cao et al., 2015;Chang et al., 2016;Yang, 2015). For example, the study on Panax ginseng (Cao et al., 2015) examined the content of medicinally important ginsenosides and found that 1 mmol/L concentration of SA was effective for their yield and quality. Similarly, a study on Astragalus membranaceus found that 1 mmol/L of MeJA was optimal for the species to yield saponins (Chang et al., 2016). Based on these prior findings, we performed a preliminary experiment in which we treated six seedlings of A. kteniophylla with 1 mmol/L of MeJA or SA and assessed triterpenoid saponin yield. The preliminary results supported the suitability of application of 1 mmol/L for our experiments.
Following seven days of treatments, we selected three healthy plants out of six from the CK and each treatment group (representing three biological replicates) for sampling of leaves and roots for quantitative real-time PCR (qRT-PCR) and spectrophotometric analysis. We performed both the qRT-PCR and spectrophotometric analyses on the same plants. Additionally, we sampled leaves and roots for RNA extraction and subsequent Illumina sequencing from the same three CK plants. After sampling leaf and root tissues for RNA extraction, we immediately stored the samples in liquid nitrogen and kept them at −80℃ until processing.

| RNA extraction and verification of differences in triterpene saponin concentrations in leaves and roots
We used spectrophotometry to verify that there was a difference in triterpenoid saponin concentrations between leaves and roots in A. kteniophylla sufficient to merit further downstream analyses. To prepare tissues for spectrophotometry, we first homogenized 0.1 g of each tissue sample, placed 1.0 ml of homogenate into 20 ml test tubes, and dried the samples using compressed air. We also generated a control comprising 1.0 ml of distilled water in a 20 ml tube.
After the samples were dry, we performed a vanillin-perchloric acid assay (Hiai et al., 1976) to detect the presence of triterpenoid saponins. Specifically, we added 0.2 ml of 5.0% vanillin-glacial acetic acid solution and 1.0 ml of perchloric acid to the tissue and vortexed to mix. Thereafter, we placed the samples in a water bath at 80°C for ten minutes followed by an ice bath for five minutes before adding 8 ml of glacial acetic acid. We used a Specord210Plus spectrometer (Analytikjena) to measure absorbance of the resulting mixture at 550 nm, representing the absorption peak for oleanolic acid, which has the strongest peak among triterpenoid saponins (Hiai et al., 1976). We calculated the concentration of triterpene saponins using the following equation: where C is the mass of triterpenoid saponins in mg/ml calculated according to the absorbance and the standard curve, V is the volume of the extract, and W is the mass of the sample.
To evaluate differences in the concentrations of triterpenoid saponins among the CK and two experimental groups, we used SPSS22.0 to perform analyses of variance (ANOVAs), and we performed t-tests for assessing the differences within groups.
In order to generate a standard curve for comparison with the empirical results, we used graduated amounts of oleanolic acid standard solution (1, 2, 3, 4, and 5 ml) and carried out the same color reaction. Absorbance was measured at 550 nm, and a linear regression was performed on the absorbance value A with concentration C (mg/ml).
For RNA extraction, we used 50 mg of tissue with a Huayueyang Quick RNA Isolation Kit according to the manufacturer's protocol. Following extraction, we assessed the quality, purity, and integrity of the total RNA using electrophoresis on a 1.0% (w/V) gel, a NanoPhotometer spectrophotometer, and an Agilent 2100 bioanalyzer. We required that extracted RNAs Content of triterpenoid saponins (mg∕g) = (C * V)∕W,

Database
Software and non-default parameters

| Transcriptome assembly and annotation
We processed the raw reads and obtained high-quality clean sequences using Illumina Casava 1.8 (Richter & Sexton, 2009).
Specifically, we removed the adapter sequences as well as reads with ambiguity (i.e., "Ns") and/or Phred scores of ≤20 for more than 50% of bases. We performed de novo assembly of the clean, high-quality reads using Trinity (Grabherr et al., 2011) under default settings. Trinity comprises several steps, including a finishing step, "Butterfly," which determines the relative number of reads supporting each path along a de Bruijn graph to disentangle paralogs that were considered to represent a single gene in prior steps (Grabherr et al., 2011). In the earliest version of Trinity, this approached successfully disentangled all paralogs from 43% of multi-gene families and distinguished two or more paralogs from 85% of multi-gene families (Grabherr et al., 2011). Trinity is also known to outperform other assembly algorithms when paralogy is expected due to polyploidy (Chopra et al., 2014). Thus, while we cannot rule out (and fully expect) merger, or collapse, of some closely related paralogs within multi-gene families in our assemblies, we believe that Trinity represents one of the most robust tools available for de novo assembly at present. Blastn, or database-specific tools (Conesa et al., 2005). We identified transcription factor (TF) families using iTAK (Zheng et al., 2016), which applies hmmscan (Bilmes, 2006) to compare gene sequences to a profile comprising a set of rules (Jin et al., 2014;Pérez-Rodríguez et al., 2010), in this case, about allowed and disallowed domains in plant TFs based on annotated sequences from several databases.

| Analysis of differentially expressed genes based on transcriptomes and qRT-PCR
We used RSEM (Li & Dewey, 2011) to align the clean reads of each sample to the reference assembly from Trinity and determined the number of reads, or read count, aligned to each gene. We used the average of read counts from the three samples of leaves and roots to infer Foldchange.
We determined differentially expressed genes (DEGs) between the leaves and roots of three samples of A. kteniophylla representing the CK group using the DESeq package (1.10.1) in R (Love et al., 2014). DESeq infers DEGs from reads based on read count using a negative binomial distribution with mean and variance inferred via local regression. From the outcome of DESeq, we adjusted the resulting p-values using the Benjamini and Hochberg's correction (Benjamini & Hochberg, 2000). We regarded genes found by DESeq with an adjusted log2Foldchange > 1 and p-value < .05 as reliable DEGs between the two types of tissues. To visualize and compare foldchange, we used the R package, DEGseq R package (Wang, Feng, et al., 2010;Wang, Luan, et al., 2010), to generate a volcano plot (Li, 2012), which, broadly, is a scatter plot showing the magnitude of change (x-axis) versus statistical significance (y-axis).
We also compared DEGs within upstream and downstream parts of the triterpenoid saponin biosynthesis pathway between Primers were developed for this study based on the coding sequences of the genes (Table S1). As a standard, we used Beta-Actin (β-ACTIN; a "housekeeping" gene) and calculated its relative expression between the two types of tissue using 2 −ΔΔC t (Livak & Schmittgen, 2001).

| Determination of the content of triterpene saponins
Within both treatment groups and the CK, the content of triterpene saponins in root tissues of A. kteniophylla was much higher than in leaf tissues ( Figure 1) based on spectrophotometry. In both the leaves and roots, the CK bore significantly higher concentrations of triterpene saponins than either of the treatment groups (α = 0.05).
However, we observed only negligible differences in concentrations of triterpene saponins between the two treatment groups for the leaves and roots.

| Functional annotation and classification
We were able to annotate all 52,454 unigenes using at least one source among the NCBI non-redundant protein (nr) and nucleotide sequences (nt) databases, Pfam, KOG, Swiss-Prot, KEGG, and the Gene Ontology (GO) database ( Table 3). This represents a 100% success rate in annotation of unigenes. In the case of 5808 unigenes, we performed successful annotation using all databases.

| Differential expression analysis
We identified DEGs between leaves and roots of A. kteniophylla in DESeq (Table S2)

| Genes in the up-and downstream pathways of triterpene saponin biosynthesis
We found that DEGs involved in the upstream MVA pathway, fold change ≥1, respectively. Overall, more DEGs in the downstream pathway were more highly expressed in roots compared with leaves (Table S4). None of the predicted CYP450s or UTGs were annotated to the KEGG pathways for triterpenoid biosynthesis (Figure 8), and this may be because the functions of these genes are still broadly unknown and are not presently included in the focal pathways by KEGG.

| Quantitative reverse transcriptase PCR
Based on the DEGs inferred from transcriptome sequences, we selected the following genes for qRT-PCR: two AkPMD unigenes, one AkHDS unigene, one AkSS unigene, two AkSM unigenes, five AkCYP450s unigenes, and four AkUGTs unigenes. We found that all F I G U R E 5 Volcano plot showing differentially expressed genes in roots of Ardisia kteniophylla compared with leaves. The x-axis indicates the fold change in expression for different genes, and the y-axis indicates the significance level of the difference in expression. The blue dashed line represents the threshold for determining that differential expression is significant genes analyzed using qRT-PCR were upregulated in the roots of the CK plants ( Figure 9) as is generally consistent with the DEGs inferred from the transcriptomes (Table S1).

| Prediction of transcription factors
In total, we annotated 997 transcription factors in A. kteniophylla, and, of these, 590 were upregulated in the root and 401 were upregulated in leaves ( Table 4 and Table S5). In particular, members of the Myeloblastosis (MYB) and Apetala 2/Ethylene Response Factor (AP2/ERF) TF families were the most highly upregulated in roots with 95 and 76 members, respectively. Notably, MYB was also the most highly upregulated in leaves with 65 upregulated members. Overall, we found that the 1865 potential TFs accounted for 3.56% of the A. kteniophylla unigene library.

| Triterpenoid saponin biosynthesis genes of A. kteniophylla and their expression levels in leaves and roots
In A. kteniophylla, we detected a larger number of unigenes over 1 kb in length compared with its congener, Ardisia crenata (Yang, 2015); 23,568 versus 14,659, respectively. The difference in our results might be due to the availability of more data within public databases in contrast to six years ago when the prior study was published and F I G U R E 6 Gene Ontology annotations of DEGs in Ardisia kteniophylla. Terms shown on the x-axis are one level below the highest-level terms: biological process (BP), cellular component (CC), and molecular function (MF). Bold type and asterisks indicate the three terms with the largest number of DEGs overall to improvements in transcriptome sequencing capabilities (Stark et al., 2019). The long lengths of unigenes generated in this study suggest an overall high-quality transcriptome (Wang, Feng, et al., 2010;Wang et al., 2012;Wang, Luan, et al., 2010) that is likely responsible for our 100% success rate at annotating all 52,454 unigenes that we recovered.
We detected 2737 DEGs between leaves and roots of A. kteniophylla and mapped these to117 pathways in the KEGG database ( Figure 7). Of the 2737 DEGs, 108 (3.95%) were mapped to the "plant hormone signal transduction" pathway (KO:04075), and this was the largest number of genes mapped to any one pathway. Notably, genes participating in signal transduction may induce the expression of key enzymes that are involved in the production of secondary metabolites (Arimura et al., 2000), such as triterpene saponins (Yendo et al., 2014). Several specific signal transduction pathways influencing the production of triterpene saponins have been studied in medicinal species of Panax, and signaling mechanisms in those species include calcium, ethylene, and nitric oxide, and reactive oxygen species (Rahimi et al., 2015). However, more research is needed to directly connect the signal transduction enzymes that we detected in the roots and leaves of A. kteniophylla to levels of accumulation of triterpene saponins.
Our results show that the two pathways for generation of 2,3-oxidosqualene from isoprene units occur differentially in the leaves and roots based on DEGs. Within the roots, the MVA pathway is more active in the generation of 2,3-oxidosqualene from isoprene units, while, in leaves, the MEP pathway is more active. These findings were consistent with previous studies, which have shown that genes representing the MVA pathway are most highly expressed in the radicle, hypocotyl, roots, flowers, and seeds, whereas genes of the MEP pathway are more active in leaves (Vranová et al., 2013). This is likely because the MEP pathway is confined to plastids (Lichtenthaler, 1999;Rodríguez-Concepción et al., 2004), which are usually most abundant in the leaves as chloroplasts. Moreover, while the MEP pathway may be technically operational within plastids that have not been exposed to light, that is, etioplasts, precursory components for initiating the MEP pathway appear to be lacking in dark environments, thus limiting the utility of this pathway in roots (Nagata et al., 2002).
Differentially expressed genes in the downstream part of the triterpene saponin biosynthesis were, overall, more highly expressed in roots compared with leaves (Table S4). This is consistent with higher concentrations of triterpene saponins within the roots of A. kteniophylla based on prior studies using biochemical assays (Wei, 2017) and our own results ( Figure 1). Moreover, prior  et al., 2005), even in cases where the leaf is the organ preferred for pharmacology (e.g., Hedera helix, Araliaceae; Sun, Li, et al., 2017;Sun, Jiang, et al., 2017).
Notably, roots may often be the final site of biosynthesis and storage of triterpene saponins (Sawai and Saito, 2011), while earlier,  (Han et al., 2012;Rai et al., 2016).
Overall, plant roots have often been shown to accumulate higher concentrations of secondary metabolites, such as triterpenoid saponins, and this may be due to the complex nature of the soil environment, where roots are forced into close contact with potentially harmful microbes and pests, thus necessitating strong defenses (Su et al., 2005).

| Expression of triterpenoid saponin biosynthesis genes of A. kteniophylla under exposure to SA and MeJA plant hormones
We found that exogenous application of the signal transduction hormones, SA and MeJA, yielded an overall decrease in the content of triterpene saponins compared with the CK (Figure 1).
Research in other plant groups has shown that exogenous application of MeJA can increase the production of triterpenoid saponins compared with control plants, such as in Centella asiatica (Apiaceae; Mangas et al., 2006). Similar results were found in Nigella sativa (Ranunculaceae) under exogenous application of SA (Elyasi et al., 2016). However, both hormones are known to limit plant growth, potentially resulting in negative interactions between growth-related pathways and production of secondary compounds (Ellis & Turner, 2001;Rudell et al., 2002). The degree to which plants are able to both increase secondary compound production while plant growth is stunted under SA or MeJA is likely taxon-specific. For example, in Ginkgo biloba, the production of terpene trilactones was increased under SA application even while genes involved in photosynthesis (and, thus growth) were downregulated (Ye et al., 2020), and this is in contrast to our There is a known antagonistic relationship between P450 and the effects of MeJA on the production of secondary metabolites involved in defense in Arabidopsis (Lee et al., 2011), which can be further explored in future studies within A. kteniophylla. Overall, exogenous application of SA or MeJA to leaves of A. kteniophylla does not improve the production of the pharmacologically valuable triterpenoid saponins in the species, suggesting that more work is needed to understand the complexities of the biosynthesis pathway for these compounds.

| Transcription factors families that may regulate triterpenoid saponin biosynthesis in A. kteniophylla
Our analyses revealed that TFs previously reported present in

| CON CLUS IONS
In summary, in this study, we performed transcriptomic analysis to investigate the genetic basis for accumulation of triterpene saponins in roots and leaves of A. kteniophylla. We generated a large dataset of unigenes and successfully annotated these for structure and function based on several public databases. Our comparative analyses revealed that many genes involved in triterpene saponin biosynthesis have higher expression in roots than leaves, and roots are the primary organ used pharmacologically in traditional Chinese medicine. Our transcriptomic data provide a valuable genetic resource for this plant and may facilitate molecular breeding to meet demand for the medicinally bioactive triterpene saponins of the species.

CO N FLI C T O F I NTE R E S T
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Illumina short reads are available from NCBI as PRJNA675388. All other raw data are included as supporting information.